Symmetry has been a fundamental tool for exploring a broad range of complex systems. In machine learning, symmetry has been explored in both models and data. In this paper, we seek to connect the symmetries arising from the architecture of a model family with the symmetries of that family's internal representations of data. We do this by calculating a set of fundamental symmetry groups, which we call the \emph{intertwiner groups} of the model. Each of these arises from a particular nonlinear layer of the model, and different nonlinearities result in different symmetry groups. These groups change the weights of a model in such a way that the underlying function the model represents remains constant, but the internal representations of data inside the model may change. We connect intertwiner groups to a model's internal representations of data through a range of experiments that probe similarities between hidden states across models with the same architecture. Our work suggests that the symmetries of a network propagate into the symmetries of that network's representations of data, giving us a better understanding of how architecture affects the learning and prediction process. Finally, we speculate that for ReLU networks, the intertwiner groups may provide a justification for the common practice of concentrating model interpretability exploration on the activation basis of hidden layers rather than on arbitrary linear combinations of activations.
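For ReLU layers, such a weight symmetry can be pictured as a permutation composed with a positive diagonal scaling of the hidden units. The following is a minimal numpy sketch under that assumption (an illustration, not code from the paper): the transformation changes the hidden representation while leaving the network's function unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

# A tiny two-layer ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def f(x, W1, b1, W2, b2):
    return W2 @ relu(W1 @ x + b1) + b2

# An assumed element of the ReLU intertwiner group: a permutation composed
# with a positive diagonal scaling of the hidden units.
perm = rng.permutation(8)
P = np.eye(8)[perm]                       # permutation matrix
D = np.diag(rng.uniform(0.5, 2.0, 8))     # positive scaling
G = P @ D                                 # relu(G @ z) == G @ relu(z) for such G

# Push G through the nonlinearity by transforming the weights.
W1_t, b1_t = G @ W1, G @ b1
W2_t = W2 @ np.linalg.inv(G)

x = rng.normal(size=4)
h_orig = relu(W1 @ x + b1)                # the hidden representation changes...
h_new = relu(W1_t @ x + b1_t)
print(np.allclose(f(x, W1, b1, W2, b2), f(x, W1_t, b1_t, W2_t, b2)))  # ...but the function does not
```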
In many classification problems, we want a classifier that is robust to a range of non-semantic transformations. For example, a human can identify a dog in a picture regardless of the orientation and pose in which it appears. There is substantial evidence that this kind of invariance can significantly improve the accuracy and generalization of machine learning models. A common technique for teaching a model geometric invariances is to augment the training data with transformed inputs. However, which invariances are desired for a given classification task is not always known. Determining an effective data augmentation policy can require domain expertise or extensive data preprocessing. Recent efforts such as AutoAugment optimize over a parameterized search space of data augmentation policies to automate the augmentation process. While AutoAugment and similar methods achieve state-of-the-art classification accuracy on several common datasets, they are limited to learning a single data augmentation policy. Often, different classes or features call for different geometric invariances. We introduce Dynamic Network Augmentation (DNA), which learns input-conditional augmentation policies. The augmentation parameters in our model are outputs of a neural network and are implicitly learned as the network weights are updated. Our model allows for dynamic augmentation policies and performs well on data with geometric transformations conditional on input features.
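A minimal PyTorch sketch of what an input-conditional augmentation policy can look like: a small policy head predicts a per-sample rotation angle, which is applied with a differentiable warp before classification, so the augmentation parameters are learned along with the network weights. The rotation-only policy, layer sizes, and names are illustrative assumptions, not DNA's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicAugmentClassifier(nn.Module):
    """Sketch: per-input augmentation parameters produced by a policy head."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.policy = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64),
                                    nn.ReLU(), nn.Linear(64, 1), nn.Tanh())
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                                        nn.ReLU(), nn.Linear(128, num_classes))

    def forward(self, x):                      # x: (B, 1, 28, 28)
        angle = self.policy(x) * torch.pi      # per-sample angle in (-pi, pi)
        cos, sin = torch.cos(angle), torch.sin(angle)
        zeros = torch.zeros_like(angle)
        theta = torch.stack([cos, -sin, zeros, sin, cos, zeros], dim=-1).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        x_aug = F.grid_sample(x, grid, align_corners=False)   # differentiable rotation
        return self.classifier(x_aug)

model = DynamicAugmentClassifier()
logits = model(torch.randn(4, 1, 28, 28))      # augmentation parameters are learned implicitly
print(logits.shape)                            # torch.Size([4, 10])
```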
Advanced manufacturing techniques have enabled the production of materials with state-of-the-art properties. In many cases, however, the development of physics-based models of these techniques lags behind their use in the lab. This means that designing and running experiments proceeds largely via trial and error. This is suboptimal, since experiments are cost-, time-, and labor-intensive. In this work we propose a machine learning framework, differential property classification (DPC), which enables experimenters to leverage machine learning's unparalleled pattern-matching capability to pursue data-driven experimental design. DPC takes two possible experiment parameter sets and outputs a prediction of which will produce a material with a more desirable property specified by the operator. We demonstrate the success of DPC on AA7075 tube manufacturing process and mechanical property data, using shear assisted processing and extrusion (ShAPE), a solid phase processing technology. We show that by focusing on the experimenter's need to choose between multiple candidate experimental parameters, we can reframe the challenging regression task of predicting material properties from processing parameters into a classification task on which machine learning models can achieve good performance.
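The reframing in the last sentence amounts to learning a pairwise comparator over processing-parameter sets. Below is a small, self-contained sketch of that idea on synthetic data; the random-forest model, feature names, and data are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Instead of regressing a property from process parameters, classify which of
# two candidate parameter sets yields the higher (more desirable) property.
rng = np.random.default_rng(0)
params = rng.uniform(size=(200, 3))               # e.g. speed, temperature, feed rate
prop = params @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

# Build training pairs (a, b) labeled 1 if a produced the better property.
i, j = rng.integers(0, 200, 1000), rng.integers(0, 200, 1000)
X_pairs = np.hstack([params[i], params[j]])
y_pairs = (prop[i] > prop[j]).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_pairs, y_pairs)

# At design time, compare two candidate experiments before running either.
candidate_a, candidate_b = rng.uniform(size=3), rng.uniform(size=3)
pick_a = clf.predict(np.hstack([candidate_a, candidate_b])[None])[0]
print("run candidate A" if pick_a else "run candidate B")
```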
Transfer learning (TL) leverages previously obtained knowledge to learn new tasks efficiently and has been used to train deep learning (DL) models with limited amounts of data. When TL is applied to DL, a pretrained (teacher) model is fine-tuned to build a domain-specific (student) model. This fine-tuning relies on the fact that a DL model can be decomposed into a classifier and a feature extractor, and a line of studies has shown that the same feature extractor can be used to train classifiers on multiple tasks. Furthermore, recent studies have proposed multiple algorithms that fine-tune the teacher model's feature extractor to train the student model more efficiently. We note that, regardless of how the feature extractor is fine-tuned, the student model's classifier is trained on the final outputs of the feature extractor (i.e., the outputs of the penultimate layer). However, a recent study has shown that feature maps across layers of a ResNet may be functionally equivalent, raising the possibility that feature maps inside the feature extractor can also be used to train the student model's classifier. Inspired by this study, we test whether feature maps in the hidden layers of the teacher model can be used to improve the student model's accuracy (i.e., the efficiency of TL). Specifically, we develop 'adaptive transfer learning (ATL)', which can select an optimal set of feature maps for TL, and test it in the few-shot learning setting. Our empirical evaluations suggest that ATL can help DL models learn more efficiently, especially when available examples are limited.
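One way to picture the layer-selection idea: train a lightweight student head on the feature maps of several hidden layers of a frozen teacher and keep the layer that validates best. The sketch below is an assumption-laden illustration (the layer names, the logistic-regression head, and the synthetic few-shot split are not from the paper).

```python
import torch
from torchvision.models import resnet18
from torchvision.models.feature_extraction import create_feature_extractor
from sklearn.linear_model import LogisticRegression

# Frozen teacher; probe several hidden layers, not only the penultimate one.
teacher = resnet18(weights="IMAGENET1K_V1").eval()
layers = ["layer2", "layer3", "layer4", "avgpool"]
extractor = create_feature_extractor(teacher, layers)

def embed(x):                                   # x: (N, 3, 224, 224)
    with torch.no_grad():
        feats = extractor(x)
    return {k: v.flatten(1).numpy() for k, v in feats.items()}

# Tiny stand-in for a few-shot support/query split.
x_train, y_train = torch.randn(20, 3, 224, 224), torch.randint(0, 2, (20,)).numpy()
x_val, y_val = torch.randn(10, 3, 224, 224), torch.randint(0, 2, (10,)).numpy()
f_train, f_val = embed(x_train), embed(x_val)

scores = {}
for name in layers:
    head = LogisticRegression(max_iter=1000).fit(f_train[name], y_train)
    scores[name] = head.score(f_val[name], y_val)
best = max(scores, key=scores.get)              # layer whose features transfer best
print(best, scores)
```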
Methods for model interpretability have become increasingly critical for testing the fairness and soundness of deep learning. Concept-based interpretability techniques, which use a small set of human-interpretable concept exemplars to measure the influence of a concept on a model's internal representation of the input, are an important thread in this line of research. In this work, we show that these interpretability methods can suffer the same vulnerability to adversarial attacks as the models they are meant to analyze. We demonstrate this phenomenon on two well-known concept-based interpretability methods: TCAV and faceted feature visualization. We show that by carefully perturbing the examples of the concept under study, we can radically change the output of the interpretability method. The attacks we propose can induce either positive interpretations (polka dots are an important concept for a model when classifying zebras) or negative interpretations (stripes are not an important factor in identifying images of zebras). Our work highlights the fact that in safety-critical applications, security is needed not only for the machine learning pipeline but also for the model interpretation process.
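To make the attack surface concrete, here is a simplified, synthetic TCAV-style computation: a concept activation vector (CAV) is fitted as the normal of a linear classifier separating concept-example activations from random-example activations, and the score is the fraction of class gradients pointing along it. Because the CAV is fitted from the concept examples, an adversary who controls those examples can shift the reported score. This is a hedged sketch, not the paper's attack code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
concept_acts = rng.normal(loc=1.0, size=(50, 64))    # hidden activations of concept exemplars
random_acts = rng.normal(loc=0.0, size=(50, 64))     # hidden activations of random images

def tcav_score(concept_acts, random_acts, class_grads):
    # CAV = normal of a linear separator between concept and random activations.
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    # Score = fraction of class gradients with positive directional derivative.
    return float(np.mean(class_grads @ cav > 0))

class_grads = rng.normal(loc=0.2, size=(100, 64))    # d logit / d activation for the target class
print(tcav_score(concept_acts, random_acts, class_grads))

# Perturbed concept exemplars move the fitted CAV and flip the reported score.
print(tcav_score(-concept_acts, random_acts, class_grads))
```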
We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios where targets correspond to passengers and their baggage items. We propose a Self-Supervised Learning (SSL) technique to provide the model information about instance segmentation uncertainty from overhead images. Our SSL approach improves object detection by employing a test-time data augmentation and a regression-based, rotation-invariant pseudo-label refinement technique. Our pseudo-label generation method provides multiple geometrically-transformed images as inputs to a Convolutional Neural Network (CNN), regresses the augmented detections generated by the network to reduce localization errors, and then clusters them using the mean-shift algorithm. The self-supervised detector model is used in a single-camera tracking algorithm to generate temporal identifiers for the targets. Our method also incorporates a multi-view trajectory association mechanism to maintain consistent temporal identifiers as passengers travel across camera views. An evaluation of detection, tracking, and association performances on videos obtained from multiple overhead cameras in a realistic airport checkpoint environment demonstrates the effectiveness of the proposed approach. Our results show that self-supervision improves object detection accuracy by up to $42\%$ without increasing the inference time of the model. Our multi-camera association method achieves up to $89\%$ multi-object tracking accuracy with an average computation time of less than $15$ ms.
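A small sketch of the pseudo-label refinement loop described above: detections from rotated copies of an image are mapped back to the original frame and clustered with mean shift, so only rotation-consistent detections survive as pseudo-labels. The stand-in detector, noise level, and bandwidth are assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)
true_centers = np.array([[120.0, 200.0], [340.0, 90.0]])
image_center = np.array([256.0, 256.0])

def rot(points, angle_deg, center):
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return (points - center) @ R.T + center

def detect(angle_deg):
    # stand-in for the CNN detector: noisy detections in the rotated image's frame
    return rot(true_centers, angle_deg, image_center) + rng.normal(0, 4, true_centers.shape)

all_dets = []
for angle in (0, 90, 180, 270):                       # test-time augmentation
    dets = detect(angle)                              # detections in the rotated frame
    all_dets.append(rot(dets, -angle, image_center))  # map back to the original frame

clusters = MeanShift(bandwidth=25).fit(np.vstack(all_dets))
print(clusters.cluster_centers_)                      # refined, rotation-consistent pseudo-labels
```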
Frequent and cost-free satellite images are in growing demand in the research world. Satellite constellations such as Landsat 8 and Sentinel-2 provide a massive amount of valuable data daily. However, the discrepancy between these satellites' sensor characteristics makes it impractical to apply a segmentation model trained on one dataset to the other, which is why domain adaptation techniques have recently become an active research area in remote sensing. In this paper, an experiment of domain adaptation through style transfer is conducted using the HRSemI2I model to narrow the sensor discrepancy between Landsat 8 and Sentinel-2. This paper's main contribution is analyzing the expediency of that approach by comparing the results of segmentation using domain-adapted images with those without adaptation. The HRSemI2I model, adjusted to work with 6-band imagery, shows significant intersection-over-union performance improvement for both mean and per-class metrics. A second contribution is providing different schemes of generalization between two label schemes, NALCMS 2015 and CORINE: the first scheme is standardization through higher-level land cover classes, and the second is through harmonization validation in the field.
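As a concrete picture of the comparison being made, here is a toy per-class and mean IoU computation over two hypothetical prediction maps, one from raw imagery and one from adapted imagery; the synthetic labels and accuracy levels are purely illustrative.

```python
import numpy as np

def iou_per_class(pred, ref, num_classes):
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, ref == c).sum()
        union = np.logical_or(pred == c, ref == c).sum()
        ious.append(inter / union if union else np.nan)
    return np.array(ious)

rng = np.random.default_rng(0)
ref = rng.integers(0, 4, size=(256, 256))                                   # reference land-cover map
pred_raw = np.where(rng.random((256, 256)) < 0.5, ref, rng.integers(0, 4, (256, 256)))
pred_adapted = np.where(rng.random((256, 256)) < 0.8, ref, rng.integers(0, 4, (256, 256)))

for name, pred in [("without adaptation", pred_raw), ("with adaptation", pred_adapted)]:
    ious = iou_per_class(pred, ref, 4)
    print(name, ious.round(3), "mIoU:", np.nanmean(ious).round(3))
```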
When a human communicates with a machine using natural language on the web, how can the machine understand the human's intention and the semantic context of their talk? This is an important AI task, as it enables the machine to construct a sensible answer or perform a useful action for the human. Meaning is represented at the sentence level, identification of which is known as intent detection, and at the word level, in a labelling task called slot filling. This dual-level joint task requires innovative thinking about natural language and deep learning network design, and as a result, many approaches and models have been proposed and applied. This tutorial will discuss how the joint task is set up and introduce Spoken Language Understanding/Natural Language Understanding (SLU/NLU) with Deep Learning techniques. We will cover the datasets, experiments and metrics used in the field. We will describe how the machine uses the latest NLP and Deep Learning techniques to address the joint task, including recurrent and attention-based Transformer networks and pre-trained models (e.g. BERT). We will then look in detail at a network that allows the two levels of the task, intent classification and slot filling, to interact explicitly to boost performance. We will do a code demonstration of a Python notebook for this model, and attendees will have an opportunity to follow the coding demo of this joint NLU model to further their understanding.
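A minimal sketch of what such a joint model can look like with Hugging Face transformers: one BERT encoder shared by an intent head (sentence level) and a slot-tagging head (word level). The label sizes and the simple sum of the two cross-entropy losses are assumptions for illustration, not the tutorial's exact network.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class JointNLU(nn.Module):
    def __init__(self, num_intents=7, num_slots=20):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.encoder.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)   # sentence-level label
        self.slot_head = nn.Linear(hidden, num_slots)       # word-level labels

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.pooler_output)        # (B, num_intents)
        slot_logits = self.slot_head(out.last_hidden_state)        # (B, T, num_slots)
        return intent_logits, slot_logits

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
batch = tokenizer(["play jazz in the living room"], return_tensors="pt")
model = JointNLU()
intent_logits, slot_logits = model(batch["input_ids"], batch["attention_mask"])

# Joint training typically sums the two cross-entropy losses, e.g.
# loss = ce(intent_logits, intent_labels) + ce(slot_logits.transpose(1, 2), slot_labels)
print(intent_logits.shape, slot_logits.shape)
```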
Recently, there has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we first examined the existing synthesized datasets and discovered that state-of-the-art text-to-SQL algorithms did not further improve on popular benchmarks when trained with augmented synthetic data. We observed two shortcomings: illogical synthetic SQL queries from independent column sampling and arbitrary table joins. To address these issues, we propose a novel synthesis framework that incorporates key relationships from schema, imposes strong typing, and conducts schema-distance-weighted column sampling. We also adopt an intermediate representation (IR) for the SQL-to-text task to further improve the quality of the generated natural language questions. When existing powerful semantic parsers are pre-finetuned on our high-quality synthesized data, our experiments show that these models have significant accuracy boosts on popular benchmarks, including new state-of-the-art performance on Spider.
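As an illustration of the sampling idea, here is a toy sketch of schema-distance-weighted column sampling: columns from tables close to an anchor table in the foreign-key graph are drawn more often, which discourages arbitrary joins of unrelated tables. The toy schema and the exponential decay weighting are assumptions, not the paper's exact procedure.

```python
import random
from collections import defaultdict

foreign_keys = {("orders", "customers"), ("order_items", "orders"), ("order_items", "products")}
columns = {"customers": ["id", "name"], "orders": ["id", "customer_id", "date"],
           "order_items": ["order_id", "product_id", "qty"], "products": ["id", "price"]}

def schema_distance(start):
    """BFS hop count from `start` to every table over foreign-key edges."""
    graph = defaultdict(set)
    for a, b in foreign_keys:
        graph[a].add(b)
        graph[b].add(a)
    dist, frontier = {start: 0}, [start]
    while frontier:
        nxt = []
        for t in frontier:
            for n in graph[t]:
                if n not in dist:
                    dist[n] = dist[t] + 1
                    nxt.append(n)
        frontier = nxt
    return dist

def sample_columns(anchor, k=3, decay=0.5):
    # Weight each column by decay^(hops from the anchor table).
    dist = schema_distance(anchor)
    pool = [(f"{t}.{c}", decay ** dist[t]) for t in dist for c in columns[t]]
    names, weights = zip(*pool)
    return random.choices(names, weights=weights, k=k)

random.seed(0)
print(sample_columns("orders"))   # columns from nearby tables dominate the sample
```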
Datacenter operators ensure fair and regular server maintenance by using automated processes to schedule maintenance jobs to complete within a strict time budget. Automating this scheduling problem is challenging because maintenance job duration varies based on both job type and hardware. While it is tempting to use prior machine learning techniques for predicting job duration, we find that the structure of the maintenance job scheduling problem creates a unique challenge. In particular, we show that prior machine learning methods that produce the lowest error predictions do not produce the best scheduling outcomes due to asymmetric costs. Specifically, underpredicting maintenance job duration results in more servers being taken offline and longer server downtime than overpredicting maintenance job duration. The system cost of underprediction is much larger than that of overprediction. We present Acela, a machine learning system for predicting maintenance job duration, which uses quantile regression to bias duration predictions toward overprediction. We integrate Acela into a maintenance job scheduler and evaluate it on datasets from large-scale, production datacenters. Compared to machine learning based predictors from prior work, Acela reduces the number of servers that are taken offline by 1.87-4.28X, and reduces the server offline time by 1.40-2.80X.
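A short sketch of the biasing mechanism: training a regressor on an upper quantile (here the 90th percentile, via scikit-learn's quantile loss) makes underprediction rare compared with an ordinary error-minimizing predictor. The synthetic data and the choice of the 0.9 quantile are illustrative assumptions, not Acela's configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 4))                          # e.g. job type / hardware features
y = 30 + 40 * X[:, 0] + 10 * rng.lognormal(size=2000)    # skewed job durations (minutes)

mean_model = GradientBoostingRegressor().fit(X, y)                           # lowest-error baseline
q90_model = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X, y)  # biased toward overprediction

X_test = rng.uniform(size=(500, 4))
y_test = 30 + 40 * X_test[:, 0] + 10 * rng.lognormal(size=500)
for name, m in [("mean regression", mean_model), ("0.9-quantile regression", q90_model)]:
    under = np.mean(m.predict(X_test) < y_test)          # fraction of jobs underpredicted
    print(f"{name}: underpredicts {under:.0%} of jobs")
```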